@jfy133
Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/; DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. - Wikipedia
Cytosine, Thymine, Guanine & Adenine
C pairs with G (think: CGI)
A pairs with T (think: AT-AT walker)
C on one strand, G on the other (or vice versa)
A on one strand, T on the other (or vice versa)
Copy a C, get a new G (etc.)
Converting the chemical nucleotides of a DNA molecule to ACTG on your computer screen
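The pairing rules above are all you need to work out the other strand from one strand. Here is a minimal Python sketch. The function names are illustrative and do not come from any particular library.

```python
# Watson-Crick pairing: C <-> G and A <-> T
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner strand, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (strands run antiparallel)."""
    return complement(strand)[::-1]


print(complement("AATGATACGG"))          # TTACTATGCC
print(reverse_complement("AATGATACGG"))  # CCGTATCATT
```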
Not really ‘next’ anymore; consider it more ‘second’ generation (see: Nanopore)
Market leader: Illumina
(Others: Roche 454, PacBio, Ion Torrent etc.)
Sequencing by synthesis, i.e. to a single strand, attach one complementary fluorophore-modified nucleotide at a time, (normally) one colour per base
A
G
T
C
Fire mah lazer, and take a picture! Rinse and repeat!
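The attach, fire and photograph loop can be sketched for a single template strand as follows. This is a toy model. The dye colours are hypothetical placeholders rather than Illumina's actual dye scheme, and the chemistry step that removes the dye between cycles is only noted in a comment.

```python
# Minimal sketch of sequencing by synthesis for ONE template strand.
# The dye colours below are hypothetical placeholders, not a real instrument's scheme.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
DYE_COLOUR = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Return the bases of the newly synthesised strand, one per imaging cycle."""
    images = []
    for cycle in range(min(cycles, len(template))):
        # 1. Polymerase attaches the single complementary, dye-labelled nucleotide
        new_base = PAIRS[template[cycle]]
        # 2. Fire the laser and take a picture: record which colour lights up
        images.append(DYE_COLOUR[new_base])
        # 3. (Real chemistry: remove the dye/blocker so the next cycle can proceed)
    # Turn the series of pictures back into letters on the screen
    return "".join(COLOUR_TO_BASE[colour] for colour in images)


print(sequence_by_synthesis("TACG", cycles=3))  # ATG
```

The letters read out belong to the new strand, so they are the complement of the template.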
On a ‘flow cell’
But how do you get your DNA to attach to the lawn of surface-bound oligos (and not get lost)?
AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCTXXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGAXXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC
[Adapter & Index Primer] [Index] [Target primer] [Target] [Target primer] [Index] [Adapter & Index Primer] (indices in lowercase, target DNA shown as XXXXXX)
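To make the layout of the construct concrete, the sketch below cuts the top strand back into its labelled pieces. It uses the flanking sequences copied from the example. `split_construct` is a hypothetical helper, and it assumes each flank occurs exactly once, in the order shown. Real instruments do not search strings this way. They read the indices in dedicated index reads.

```python
# Flanking sequences copied from the example construct above (top strand).
P5_SIDE = "AATGATACGGCGACCACCAC"               # adapter & index primer
READ1_PRIMER = "CCCTACACGACGCTCTTCCGATCT"      # target primer (left)
READ2_PRIMER = "AGCACACGTCTGAACTCCAGTCAC"      # target primer (right)
P7_SIDE = "CCGTCTTCTGCTTG"                     # adapter & index primer


def split_construct(seq: str):
    """Hypothetical helper: return (index1, target, index2) from the top strand."""
    seq = seq.upper()
    start = seq.index(P5_SIDE) + len(P5_SIDE)
    r1 = seq.index(READ1_PRIMER, start)
    index1 = seq[start:r1]                      # bases between adapter and primer
    target_start = r1 + len(READ1_PRIMER)
    r2 = seq.index(READ2_PRIMER, target_start)
    target = seq[target_start:r2]               # the DNA we actually care about
    index2_start = r2 + len(READ2_PRIMER)
    p7 = seq.index(P7_SIDE, index2_start)
    index2 = seq[index2_start:p7]
    return index1, target, index2


top = ("AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCT"
       "XXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG")
print(split_construct(top))  # ('ACCGACAA', 'XXXXXX', 'GACACTA')
```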
Once bound, the fluorescence from one molecule is not bright enough to detect…
Make lots of copies, a.k.a. clustering! One cluster == many copies of one DNA molecule
Over many cycles, the imaging reagents get ‘tired’ and more errors occur (so quality drops towards the end of a read)
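A cluster is imaged as one spot, so the base call is effectively a vote among all of its copies. The toy simulation below illustrates one well-known reason the vote degrades over time. In each cycle, a few copies fail to extend and fall permanently out of sync ('phasing'), which adds light of the wrong colour. The phasing rate and copy number are hypothetical. The sketch does not model other sources of decay, such as dye or enzyme degradation.

```python
import random
from collections import Counter

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_cluster(template, n_copies=1000, phasing_rate=0.01, seed=0):
    """Sequence one cluster of identical copies.

    Returns (base calls, fraction of copies still in phase after each cycle).
    phasing_rate is a hypothetical per-cycle chance that a copy fails to extend.
    """
    rng = random.Random(seed)
    extended = [0] * n_copies  # number of bases each copy has incorporated so far
    calls, in_phase = [], []
    for cycle in range(len(template)):
        signal = Counter()  # total light per colour (here: per base) in this image
        for i, pos in enumerate(extended):
            if rng.random() >= phasing_rate:
                # Copy adds its next base; lagging copies add the "wrong" base
                signal[PAIRS[template[pos]]] += 1
                extended[i] = pos + 1
            # otherwise this copy falls one (more) base behind for good
        # Majority colour wins
        calls.append(signal.most_common(1)[0][0] if signal else "N")
        in_phase.append(sum(p == cycle + 1 for p in extended) / n_copies)
    return "".join(calls), in_phase


calls, in_phase = simulate_cluster("TACGGATC", phasing_rate=0.02)
print(calls)
print([round(f, 2) for f in in_phase])  # in-phase fraction shrinks every cycle
```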
What if the molecule is longer than the number of imaging cycles?
Improvement: paired-end sequencing (sequence the molecule from both ends)
© 2021 Illumina, Inc. All rights reserved. Used here for training purposes only.
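With paired-end data, read 2 starts at the other end of the molecule, on the opposite strand. When the molecule is shorter than the two reads combined, the reads overlap in the middle and can be merged back into one sequence. The function below is a minimal sketch that assumes error-free reads and ignores quality scores. Dedicated read-merging tools also tolerate mismatches and use the qualities. `merge_pair` is a hypothetical name.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 5) -> Optional[str]:
    """Merge R1 and R2 if they overlap exactly by at least min_overlap bases."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Put R2 in the same orientation as R1
    r2 = reverse_complement(read2)
    # Prefer the longest exact overlap between the end of R1 and the start of R2
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-overlap:] == r2[:overlap]:
            return read1 + r2[overlap:]
    return None  # no (long enough) overlap: leave the reads as a pair


molecule = "ACGTTGCATGCCATGAAGTC"
r1 = molecule[:14]                       # read 1: first 14 bases
r2 = reverse_complement(molecule[6:])    # read 2: last 14 bases, other strand
print(merge_pair(r1, r2) == molecule)    # True
```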
Special software (e.g. bcl2fastq):
For each location on the flow cell (cluster), turn the series of recorded colours into a sequence of bases (a ‘read’)
Group each read with the others that share the same index (‘demultiplexing’)
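The grouping step can be sketched as a lookup of each read's index in a table of samples. This is not bcl2fastq's implementation. The sample names, the index sequences and the one-mismatch tolerance are all assumptions for illustration. Reads that match no sample, or more than one, go into an 'Undetermined' bin.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads, sample_indices, max_mismatches=1):
    """Group (read_id, index, sequence) tuples by sample.

    sample_indices maps a (hypothetical) sample name -> its index sequence.
    """
    bins = defaultdict(list)
    for read_id, index, sequence in reads:
        hits = [name for name, idx in sample_indices.items()
                if hamming(index, idx) <= max_mismatches]
        # Only assign when exactly one sample matches
        sample = hits[0] if len(hits) == 1 else "Undetermined"
        bins[sample].append((read_id, sequence))
    return dict(bins)


samples = {"sample1": "ACCGACAA", "sample2": "TTAGGCTC"}  # hypothetical sample sheet
reads = [
    ("r1", "ACCGACAA", "ACGT"),  # exact match to sample1
    ("r2", "ACCGACAT", "GGCC"),  # one mismatch, still sample1
    ("r3", "TTAGGCTC", "TTTT"),  # sample2
    ("r4", "GGGGGGGG", "AAAA"),  # matches nothing
]
print(demultiplex(reads, samples))
```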
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity. - Wikipedia
Example
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72 # Read ID
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA # DNA sequence
+ # Separator
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/ # Quality line
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
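Each record in the example is exactly four lines, so a reader can simply consume the file four lines at a time. Counting lines is safer than scanning for lines that start with '@', because '@' is also a valid quality character (Q31). This is a minimal sketch with hypothetical names. It does not cover gzip input or multi-line (wrapped) records. Libraries such as Biopython handle those cases.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading '@'
    sequence: str  # the bases
    quality: str   # one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a 4-lines-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    demo = io.StringIO("@read1 length=4\nACGT\n+\nII#!\n")
    for record in read_fastq(demo):
        print(record.read_id, record.sequence, record.quality)
```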
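The read ID line always starts with @
Quality score (Phred+33 encoding, as used by Illumina 1.8+): one character per base
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ
0.2......................26...31........41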
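Decoding the scale above is simple arithmetic. Subtract 33 from a character's ASCII code to get its Phred score Q. The chance that the base call is wrong is then 10^(-Q/10), so Q30 means roughly 1 error in 1,000. The helper names below are illustrative.

```python
from typing import List


def phred33_to_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality line: ASCII code minus 33 gives the Phred score."""
    return [ord(char) - 33 for char in quality]


def error_probability(q: int) -> float:
    """Phred score Q -> probability that the base call is wrong, 10^(-Q/10)."""
    return 10 ** (-q / 10)


print(phred33_to_scores("!#;@I"))  # [0, 2, 26, 31, 40]
print(error_probability(30))       # about 0.001 (1 in 1,000)
```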